{
 "metadata": {
  "name": "",
  "signature": "sha256:2b600cbd558eb3b85fe66409a81ab50c3ab3ebbc91179ae09bbb23ac24077d7c"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "Maximum Likelihood Model Parameter Estimation"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "\\begin{equation*}\n",
      "D = \\text{data} \\\\\n",
      "\\theta = \\text{model parameters} \\\\\n",
      "y(\\theta, x) = \\text{model}\n",
      "\\end{equation*}\n",
      "\n",
      "A common parameter-estimation strategy is to minimize error,\n",
      "\n",
      "\\begin{equation*}\n",
      "\\sum_{i}(D_i - y_i(\\theta, x))^2\n",
      "\\end{equation*}\n",
      "\n",
      "E.g.\n",
      "\n",
      "Create some phony straight line data with $\\theta=(a, b)$ and normally distributed noise in $y$,"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%pylab inline\n",
      "import numpy as np\n",
      "import matplotlib.pyplot as plt\n",
      "\n",
      "n_points = 10\n",
      "x = np.linspace(0,10,n_points)\n",
      "def y(x, theta=(2, 4), noise=0):\n",
      "    return  theta[0] * x + theta[1] + noise*np.random.randn(n_points)\n",
      "D = y(x, noise=1)\n",
      "plt.scatter(x,D)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We are trying to discover parameters, $\\theta$, so write error in terms of only $theta$,\n",
      "\n",
      "$SSE(\\theta; y(x), D)$"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# cheat, do this with global scope of y, x and D\n",
      "def ssq_err(theta):\n",
      "    return np.sum((D - y(x,theta))*(D - y(x,theta)).transpose())\n",
      "\n",
      "from scipy.optimize import minimize\n",
      "fit = minimize(ssq_err,[1,1])\n",
      "print fit\n",
      "plt.scatter(x, D)\n",
      "plt.plot(x,y(x, fit.x))\n",
      "print fit"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Not amazing so far $\\ldots$\n",
      "\n",
      "__Leading quesiton__:  Is normally distributed noise \"unusually compatible\" with least squares errors minimization?\n",
      "\n",
      "Maybe we should do something that takes into account the expected distibution of the model errors? What if the errors are distributed by Poisson, for example, does sum-squared-error still make sense?"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from scipy.stats import norm, poisson\n",
      "rvn, rvp = norm(5,1), poisson(5)\n",
      "x = np.linspace(-2,10,140)\n",
      "plot(x, rvn.pdf(x))\n",
      "xd = np.arange(0, np.minimum(rvp.dist.b, 15))\n",
      "vlines(xd, 0, rvp.pmf(xd), lw=2, color=\"red\")"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "__New Idea__:  The parameter fitting problem can be framed as \"Can we find the mostly likely value of the parameters $\\theta$ of the model given the data?\"\n",
      "\n",
      "i.e. maximize $P(\\theta|D)$"
     ]
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Frequentist Statistics"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Significance is specified by probability that\n",
      "the data was drawn from the \u201cwrong\u201d\n",
      "model (null hypothesis): p value\n",
      "\n",
      "* Not the probability a model is correct\n",
      "* Only random variables have probabilities\n",
      "\n",
      "That is, from a frequentist perspective, the PDF for normal noise is in terms of $P(D|\\theta)$,\n",
      "\n",
      "E.g.\n",
      "\\begin{equation*}\n",
      "P(D|\\theta) \\propto exp(\\sum_i -\\frac{(D_i - y_i(\\theta))^2}{2\\sigma^2})\n",
      "\\end{equation*}\n",
      "\n",
      "We need to flip this around."
     ]
    },
    {
     "cell_type": "heading",
     "level": 3,
     "metadata": {},
     "source": [
      "Bayesian inference"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "* Frequentist: Probability is a frequency\n",
      "(of a random variable)\n",
      "* Bayesian: Probability is a measure of\n",
      "belief or certainty"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Some statistics Rules\n",
      "\n",
      "\\begin{equation}\n",
      "P(A,B) = P(A|B)P(B) = P(B|A)P(A) \\\\\n",
      "P(A) = \\int_\\Omega P(A,B)dB = \\int_\\Omega P(A|B)P(B)dB \\\\\n",
      "\\end{equation}\n",
      "\n",
      "Bayesian statistics is the application of these rules"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Back to data and parameters:\n",
      "\n",
      "\\begin{equation*}\n",
      "P(\\theta|D) = \\frac{P(D|\\theta)P(\\theta)}{P(D)}\n",
      "\\end{equation*}\n",
      "\n",
      "\\begin{equation*}\n",
      "P(D|\\theta) : \\text{Likelihood} \\\\\n",
      "P(\\theta) : \\text{Prior} \\\\\n",
      "P(D) : \\text{Evidence} \\\\\n",
      "P(\\theta|D) : \\text{Posterier} \\\\\n",
      "\\end{equation*}\n",
      "\n",
      "\\begin{equation*}\n",
      "P(D) = \\int_\\Omega P(D|\\theta)P(\\theta)d\\theta \\\\\n",
      "\\end{equation*}\n",
      "\n",
      "Integration over the parameter space of the model."
     ]
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Simple Discrete Model: Coin Toss"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "In 10 trials, we get 7 heads: What's our best estimate of the probability of flipping heads with this coin?  \n",
      "\n",
      "The distribution of coin toss outcomes is the Binomial Distribution.\n",
      "\n",
      "$pmf(y|t,w) = \\binom{n}{k} p^k (1-p)^{(n-k)}$"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Let's not forget how obvious this is and guess 0.7."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from scipy.stats import binom\n",
      "rv = binom(10, 0.7)\n",
      "x = np.arange(0, np.minimum(rv.dist.b, 10))\n",
      "h = plt.vlines(x, 0, rv.pmf(x), lw=2, color=\"orange\")"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "\"Speci\ufb01cally, the PDF in Fig. 1 is a function of the data given a particular set of parameter values, de\ufb01ned on the data scale. On the other hand, the likelihood function is a function of the parameter given a particular set of observed data, de\ufb01ned on the\n",
      "parameter scale. In short, Fig. 1 tells us the probability of a particular data value for a \ufb01xed parameter, whereas Fig. 2 tells us the likelihood (\u2018\u2018unnormalized probability\u2019\u2019) of a particular parameter value for a \ufb01xed data set.\" (.J. Myung / Journal of Mathematical Psychology 47 (2003) 90\u2013100)"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "def f(w):\n",
      "    rv = binom(10, w)\n",
      "    return rv.pmf(7)\n",
      "# Hold the data constant, vary the parameter\n",
      "x = np.linspace(0,1,100)\n",
      "plot(x, f(x))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "It looks like the maximum likelihood on the data over the parameter is around 0.7?\n",
      "\n",
      "Our data could have been (6,10) or (8,10) with a 0.7 coin.  But we started by asking about the coin, to make any headway, we need to flip the question around."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "dat = [10, 7]\n",
      "def L_binom(w):\n",
      "    rv = binom(dat[0], w)\n",
      "    x = dat[1]\n",
      "    # -1 to make max problem a min problem\n",
      "    return -rv.pmf(x)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "x = np.linspace(0.1, 0.9, 9)\n",
      "scatter(x, L_binom(x))\n",
      "from scipy.optimize import minimize_scalar\n",
      "minimize_scalar(L_binom, bounds=[0,1], method=\"bounded\")"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Slightly More Complicated Model: Exponential with Normal Error"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "n_points = 10\n",
      "x = linspace(0,100,n_points)\n",
      "def y(x, theta=(10, 0.05), noise=0):\n",
      "    return  theta[0] * np.exp(-x * theta[1]) + noise*randn(n_points)\n",
      "D = y(x, noise=1)\n",
      "scatter(x,D)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Likelihood:\n",
      "    \n",
      "\\begin{equation}\n",
      "P(D|\\theta) \\propto \\prod_i {\\exp(-\\frac{(D_i - y_i(\\theta))^2}{2\\sigma^2})} \\propto \\prod_i {\\exp(-\\frac{(D_i - (10\\exp(-wx)))^2}{2})}\n",
      "\\end{equation}\n",
      "\n"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# use x, D from global scope\n",
      "def L_cost(w):\n",
      "    return np.prod(np.exp(((D - y(x,(10, w)))*(D - y(x,(10, w))).transpose())/2.))\n",
      "w = np.linspace(0,0.1,10)\n",
      "L_c = [L_cost(i) for i in w]\n",
      "scatter(w,L_c)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Numerically troubling!  Turn the product into a sum with log!"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# use x, D from global scope\n",
      "def L_cost(w):\n",
      "    return np.sum(((D - y(x,(10, w)))*(D - y(x,(10, w))).transpose())/2.)\n",
      "w = np.linspace(0,0.1,10)\n",
      "L_c = [L_cost(i) for i in w]\n",
      "scatter(w,L_c)\n",
      "from scipy.optimize import minimize_scalar\n",
      "minimize_scalar(L_cost, bounds=[0,0.1], method=\"bounded\")"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "So it might be useful to use the $\\log$ likelihood:\n",
      "\n",
      "\\begin{equation}\n",
      "\\log(P(D|\\theta)) \\propto \\sum_i \\log(exp(-\\frac{(D_i - (10\\exp(-wx)))^2}{2})) = - \\sum_i \\frac{(D_i - (10\\exp(-wx)))^2}{2}\n",
      "\\end{equation}\n",
      "\n",
      "__Normally distributed errors + Minimum-Log-Likelihood is the same solution as Lease Squares.__"
     ]
    },
    {
     "cell_type": "heading",
     "level": 2,
     "metadata": {},
     "source": [
      "Exponential Decay and Poisson Likelihood"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "\\begin{equation}\n",
      "\\log(P) \\propto \\sum_i \\log(\\frac{\\lambda^{x_i}e^{-\\lambda}}{\\Gamma(x_i)}) = \\sum_i (x_i) \\log\\lambda + (-\\lambda) - \\log\\Gamma(x_i)\n",
      "\\end{equation}\n",
      "\n",
      "Sterling $\\ldots$ or realize that $\\Gamma(x_i)$ is a constant w.r.t. MLE estimation of $\\lambda$.\n",
      "\n",
      "\n",
      "\\begin{equation}\n",
      "\\log(P(D|\\theta)) \\propto \\sum_i \\log(\\frac{y_i(\\theta))^{D_i}e^{-y_i(\\theta))}}{\\Gamma(D_i)}) = \\sum_i D_i \\log y_i(\\theta) -y_i(\\theta)\n",
      "\\end{equation}"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "from numpy.random import poisson\n",
      "n_points = 20\n",
      "x = linspace(0,300,n_points)\n",
      "def y(x, theta=(110, 0.03), noise=0):\n",
      "    ty = theta[0] * np.exp(-x * theta[1])\n",
      "    if noise != 0:\n",
      "        ty = [poisson(k)for k in ty] \n",
      "    return ty\n",
      "D = y(x, noise=5)\n",
      "scatter(x,D)\n",
      "plot(x,y(x))"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# use x, D from global scope\n",
      "def P_cost(*w):\n",
      "    return -np.sum(D*np.log(y(x,*w)) - y(x,*w))\n",
      "fit_ml = minimize(P_cost, (105,0.05))\n",
      "print fit_ml"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# cheat, do this with global scope of y, x and D\n",
      "def ssq_err(*w):\n",
      "    return np.sum((D - y(x,*w))*(D - y(x,*w)).transpose())\n",
      "fit_ssq = minimize(ssq_err,[105,0.05])\n",
      "print fit_ssq\n",
      "scatter(x, D)\n",
      "plot(x,y(x,fit_ml.x), color=\"green\")\n",
      "plot(x,y(x,fit_ssq.x), color=\"red\")"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Need more?\n",
      "\n",
      "http://people.physics.anu.edu.au/~tas110/Teaching/Lectures/L3/Material/Myung03.pdf\n",
      "\n",
      "http://sciencehouse.files.wordpress.com/2013/07/gordonconf13.pdf\n",
      "\n",
      "http://www.math.utah.edu/~levin/M5080/mle.pdf\n"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [],
     "language": "python",
     "metadata": {},
     "outputs": []
    }
   ],
   "metadata": {}
  }
 ]
}